Generating a script based on user actions

ABSTRACT

Embodiments of a computer system, a method and a computer-program product (e.g., software) for use with the computer system are described. These embodiments may allow users to create or maintain (including repair and validation) scraping scripts, which collect user information from a web page. In particular, using a web browser extension, one or more users' actions while navigating or interacting with the web page are captured. This captured user-action information may specify the layout of the web page, including data locations and/or types of data. Then, using the captured user-action information, a new scraping script can be generated or, based at least in part on determined changes to the web page, an existing scraping script can be maintained.

BACKGROUND

The present invention relates to techniques for generating a script to scrape a web page based on actions of a user while navigating through the web page.

Online account systems (such as Internet Banking) are increasingly popular. These websites allow easy access to account balance and transaction information for a single online merchant or financial institution. However, for a complete understanding of an individual's or a business' financial position, data from multiple merchants and financial institutions may need to be aggregated.

Several existing web services provide aggregated financial information that is collected from online accounts using Open Financial Exchange (OFX). However, not all of the websites that host online accounts support OFX. To address this problem, a central server can be used to aggregate the financial information. Using customer credential information (such as a username and password) to log in to an online account on a website, this server can collect or scrape the appropriate data from the returned formatted web pages, and thus, can aggregate the financial information.

Typically, the financial information is collected from websites using scraping scripts. A scraping script usually includes commands that parse and interact with one or more web pages via a network, such as the Internet. For a scraping script to function properly, it is typically designed based on the details of a given web page (such as the web page flows to log in and access data), so that the relevant customer financial information can be located and collected. For example, a scripting engineer may manually analyze the financial institution's web page to determine the sequence of commands needed to navigate and obtain specific data from this web page. Therefore, creating a scraping script can be time-consuming and expensive.

In addition, if the financial institution modifies a particular website and/or if there are changes to a customer's online account, a scraping script may not function correctly. When this occurs, a scripting engineer typically has to access the website to duplicate the exact problem that the server encountered, and then update the scraping script accordingly. This process is also expensive and can be time-consuming.

SUMMARY

One embodiment of the present invention relates to a computer system that generates a script. During operation, the computer system receives user-action information that was captured during a session where a user accessed a web page. This captured user-action information includes information about how the user navigated through the web page during the session, a layout of the web page, and data locations on the web page. Then, the computer system generates the script based at least in part on the captured user-action information, where the script is configured to execute on the computer system to scrape information from the web page without user intervention, and where generating the script involves translating the captured user-action information into executable operations.

In some embodiments, generating the script involves determining changes to the web page based at least in part on the captured user-action information and revising an existing script based at least in part on the determined changes. This may be useful for web pages that are occasionally modified.

Note that the user-action information may have been captured by a software application that executes in a virtual environment of a web browser. In some embodiments, prior to receiving the user-action information, the computer system receives a request for the software application from the user and, in response to the request, provides the software application.

The software application may avoid capturing sensitive information to assure the user that there is no risk associated with the capture of the user-action information. Consequently, the user-action information may exclude credential information provided by the user through the web page during the session.

Furthermore, the user-action information may include metadata associated with the data locations. This metadata may specify types of data.

Additionally, the user-action information may include one or more events in which the user communicated information with a host system that hosts the web page. During at least one of the one or more events, the user may have provided data to the host system, for example, by selecting an item in a menu or by typing information into a field.

Note that the user-action information may include information corresponding to at least a portion of a hierarchical structure of the web page (however, in other embodiments the entire web page may be captured). This hierarchical structure may specify the layout of the web page and the data locations. For example, the hierarchical structure may include a set of nodes corresponding to an eXtensible markup language (XML) path.

In some embodiments, the computer system repeats the receiving operation for multiple users in multiple sessions. Then, using the captured user-action information from the multiple users, the computer system may generate the script.

Another embodiment provides a method including at least some of the above-described operations.

Another embodiment provides a computer-program product for use in conjunction with the computer system.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a block diagram illustrating a web page in accordance with an embodiment of the present invention.

FIG. 2 is a flow chart illustrating a process for generating a script in accordance with an embodiment of the present invention.

FIG. 3 is a block diagram illustrating a networked computer system that generates and executes a script in accordance with an embodiment of the present invention.

FIG. 4 is a block diagram illustrating a computer system that generates and executes a script in accordance with an embodiment of the present invention.

FIG. 5 is a block diagram illustrating a data structure in accordance with an embodiment of the present invention.

Table 1 provides an illustration of an eXtensible markup language (XML) path (XPath) for a web-page element and corresponding macro commands in a scraping script.

Note that like reference numerals refer to corresponding parts throughout the drawings.

DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

Embodiments of a computer system, a method and a computer-program product (e.g., software) for use with the computer system are described. These embodiments may allow users to create or maintain (including repair and validation) scraping scripts, which collect user information from a web page. In particular, using a web browser extension, one or more users' actions while navigating or interacting with the web page are captured. This captured user-action information may specify the layout of the web page, including data locations and/or types of data. Then, using the captured user-action information, a new scraping script can be generated or, based at least in part on determined changes to the web page, an existing scraping script can be maintained.

This scripting technique can reduce the time and expense associated with creating new scraping scripts or maintaining existing scraping scripts. Consequently, this scripting technique may reduce the costs of software providers that use scraping scripts. In addition, by helping to maintain the freshness of scraping scripts, this scripting technique can improve the satisfaction and productivity of users of the software provider's products because the ability of these products to aggregate user financial information (for subsequent use in these products) may be more reliable.

We now describe embodiments of a process for generating a scraping script. In the discussion that follows, ‘generating’ should be understood to include creating a new scraping script and/or maintaining an existing scraping script.

FIG. 1 presents a block diagram illustrating a web page 100, such as a web page associated with a financial institution. This web page includes multiple web-page elements 110 (such as data fields) in which users can provide information (for example, a web-page element may be a text box) or select information (for example, a web-page element may be a pull-down menu). When creating a new scraping script, a scripting engineer typically needs to determine the locations of web-page elements 110 on web page 100. In addition, the scripting engineer typically needs to determine the types of data associated with web-page elements 110. These tasks are often accomplished by detailed examination of the HyperText Markup Language (HTML) of web page 100. Similarly, when maintaining an existing scraping script (such as when a problem occurs during scraping of web page 100), the scripting engineer may need to: reproduce the problem, fix the problem by modifying the scraping script, and then verify that the fix works reliably.

In the discussion that follows, a scripting technique that significantly reduces the effort needed to generate the scraping script is described. For example, the time needed to update an existing scraping script may be reduced by 50-75%. As an illustration of this scripting technique, one or more users of financial software agree to allow their actions while accessing their financial accounts at one or more financial institutions to be captured by a provider of the financial software. Using the captured user-action information from at least one user, the provider of the financial software can generate the scraping script.

In particular, the one or more users may agree to download a web-browser extension (such as a Mozilla Firefox extension or an Internet Explorer™ Browser Helper Object), or more generally a software application, that records the way the one or more users navigate through and interact with a website of a given financial institution (which includes one or more web pages) while accessing their financial accounts during one or more sessions. (Note that the one or more users may selectively enable or disable the web-browser extension.)

Then, the web-browser extension may upload the captured user-action information to a web service offered by the provider of the financial software via a network. This captured user-action information includes a sequence of events and tasks that allow the scraping script to be generated, for example, by software (such as a generating module) that mimics the actions of the one or more users. Moreover, the captured user-action information can be stored for use in generating other scraping scripts in the future. In this way, multiple additional users (who did not agree to download the web-browser extension) can benefit from the scraping script.

In general, the scraping script may be executed on a different computer system (or server) than the computer system that the web-browser extension executes on. For example, the web-browser extension may execute on a client computer, while the scraping script may execute on a central server. To facilitate this, generating the scraping script may involve translating the captured user-action information into corresponding commands or operations that can execute on the server. Note, however, that in other embodiments the scraping script may execute in another web-browser extension on the client computer.
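As a non-limiting illustration of this translation step, the following Python sketch maps captured user actions to server-side operations. The record layout (`UserAction`), the action kinds, and the operation names (`FillField`, `SelectOption`, `SubmitForm`, `Click`) are hypothetical and are used only for illustration; they are not specified in this description.

```python
from dataclasses import dataclass


@dataclass
class UserAction:
    """One captured user action (hypothetical layout)."""
    kind: str        # e.g. "type", "select", "form_post", "click"
    target: str      # name of the form or web-page element acted on
    value: str = ""  # what the user typed or selected, if anything


# Hypothetical mapping from captured action kinds to executable operations.
OPERATIONS = {
    "type": "FillField",
    "select": "SelectOption",
    "form_post": "SubmitForm",
    "click": "Click",
}


def translate_actions(actions):
    """Translate captured actions, in order, into (operation, target, value) tuples."""
    operations = []
    for action in actions:
        if action.kind not in OPERATIONS:
            raise ValueError(f"no operation defined for action kind {action.kind!r}")
        operations.append((OPERATIONS[action.kind], action.target, action.value))
    return operations
```

Because the output is a plain ordered list of operations, the same list could in principle be replayed by a server-side interpreter or by another web-browser extension.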

FIG. 2 presents a flow chart illustrating a process 200 for generating a script (such as a scraping script), which may be performed by a computer system. During operation, the computer system receives user-action information that was captured during a session where a user accessed a web page (212). This captured user-action information includes information about how the user navigated through the web page during the session, a layout of the web page, and data locations on the web page. Then, the computer system generates the script based at least in part on the captured user-action information (216), where the script is configured to execute on the computer system to scrape information from the web page without user intervention, and where generating the script involves translating the captured user-action information into executable operations. This script may be used to scrape information for this user and/or for other users (such as users that have provided credential information).

In some embodiments, generating the script involves determining changes to the web page based at least in part on the captured user-action information and revising an existing script based at least in part on the determined changes.
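One possible way to determine such changes, offered only as a sketch, is to compare the data locations that an existing script relies on with the data locations in newly captured user-action information. In the Python example below, both inputs are assumed to be dictionaries that map a field label to an XPath; the function names and this representation are assumptions made for illustration.

```python
def detect_changes(old_locations, new_locations):
    """Compare two {field label: XPath} mappings and report the differences."""
    shared = old_locations.keys() & new_locations.keys()
    moved = {
        label: (old_locations[label], new_locations[label])
        for label in shared
        if old_locations[label] != new_locations[label]
    }
    return {
        "moved": moved,
        "removed": sorted(old_locations.keys() - new_locations.keys()),
        "added": sorted(new_locations.keys() - old_locations.keys()),
    }


def revise_locations(script_locations, changes):
    """Return a copy of the script's locations with moved fields updated.

    Removed and added fields are left untouched here; in this sketch they
    are only reported so that they can be reviewed separately.
    """
    revised = dict(script_locations)
    for label, (_old_xpath, new_xpath) in changes["moved"].items():
        revised[label] = new_xpath
    return revised
```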

Note that the user-action information may have been captured by a software application (such as the web-browser extension) that executes in a virtual environment of a web browser. In some embodiments, prior to receiving the user-action information, the computer system optionally receives a request for the software application from the user and, in response to the request, provides the software application (210).

In some embodiments, the computer system optionally repeats (214) the receiving operation for multiple users in multiple sessions. Then, using the captured user-action information from the multiple users, the computer system may generate the script (216). This may be useful because when fixing a problem with the script (such as a problem associated with one type of financial account), a new problem may occur with a different type of financial account. By capturing and using user-action information from multiple users to generate the script (216), the likelihood of causing additional problems may be reduced or eliminated.
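To illustrate how captured information from multiple sessions might be used to guard against such regressions, the following Python sketch checks a candidate script's data locations against every captured session and reports the sessions in which a location is absent. The function name and the representation of a session as a set of XPaths are assumptions made for this example.

```python
def sessions_missing_locations(script_xpaths, sessions):
    """Report, per session, the script XPaths that were not seen in that session.

    script_xpaths: iterable of XPaths the candidate script relies on.
    sessions: dict mapping a session identifier to the set of XPaths
              captured during that session.
    """
    required = set(script_xpaths)
    problems = {}
    for session_id, captured in sessions.items():
        missing = sorted(required - set(captured))
        if missing:
            problems[session_id] = missing
    return problems
```

An empty result suggests that the candidate script is consistent with all of the captured sessions; a non-empty result identifies the sessions (for example, a different type of financial account) that may need further attention.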

In some embodiments of process 200, there may be additional or fewer operations. Moreover, the order of the operations may be changed and/or two or more operations may be combined into a single operation.

In an exemplary embodiment, the captured user-action information includes a sequence of one or more events (which is sometimes referred to as an event file) in which the given user highlighted or provided data in one or more of web-page elements 110 (FIG. 1). These events may or may not include communication with a host system for the website (such as a server or computer system). For example, during some events, the given user clicks on a submit icon or button on web page 100 (FIG. 1), which communicates information with the host system. (This type of event is sometimes referred to as a form post.) However, during other events, the given user selects a choice in a pull-down menu, which does not require communication with the host system. In either case, the web-browser extension may capture: user actions, the order of the user actions, the location of web-page elements 110 in FIG. 1 (e.g., HTML code associated with web page 100 in FIG. 1), and/or what the given user typed and/or clicked on. Note that the user-action information may include metadata associated with web-page elements 110 (FIG. 1) (such as data fields and what they mean), which may specify types of data associated with one or more of web-page elements 110 (FIG. 1).

In some embodiments, the web-browser extension may avoid capturing sensitive information. Consequently, the user-action information may exclude credential information provided by the user through a website during a given session.
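As a minimal sketch of how credential information might be excluded, the Python example below drops the typed value from any captured event whose field looks sensitive, while keeping the event itself so that the navigation sequence is preserved. The event keys (`field_name`, `field_type`, `value`) and the heuristics used to recognize a sensitive field are assumptions made for illustration; an actual web-browser extension would apply comparable logic in a client-side scripting language.

```python
# Hypothetical heuristics for recognizing credential fields.
SENSITIVE_TYPES = {"password"}
SENSITIVE_NAME_HINTS = ("user", "login", "pass")


def redact_event(event):
    """Return a copy of a captured event with any sensitive value removed."""
    field_type = event.get("field_type", "").lower()
    field_name = event.get("field_name", "").lower()
    sensitive = field_type in SENSITIVE_TYPES or any(
        hint in field_name for hint in SENSITIVE_NAME_HINTS
    )
    cleaned = dict(event)
    if sensitive:
        cleaned["value"] = None   # keep the event, drop what was typed
        cleaned["redacted"] = True
    return cleaned
```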

Furthermore, the captured user-action information may specify at least a portion of a hierarchical structure of web page 100 (FIG. 1). This hierarchical structure may specify the layout of web page 100 (FIG. 1) and web-page elements 110 (FIG. 1). For example, the hierarchical structure may include a set of nodes corresponding to an eXtensible markup language (XML) path (XPath) of web-page elements 110 (FIG. 1).

Each of the user actions (such as a form post) may be subsequently translated into an equivalent macro command, which may be pre-populated with any web-page-element information (such as a form name, etc.). Furthermore, for a given web-page element that the user accessed, the associated XPath may be broken down into a corresponding macro sequence that allows this web-page element to be reached. This is illustrated in Table 1. In this example, an XPath to the third row in a table on a web page is converted into the listed macro commands. Collectively, the combination of macro commands results in a scraping script that can navigate and download specific information from web page 100 (FIG. 1).

TABLE 1
XPath: //HTML/BODY[1]/TABLE[1]/TBODY[1]/TR[2]/TD[1]
Macro commands:
,MoveToTag,"HTML",
,MoveToTag,"BODY",
,MoveToTable,,
,MoveToTableRow,,
,MoveToTableRow,,
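The breakdown of an XPath into a macro sequence can be sketched in Python as follows. The macro names are taken from Table 1, but their exact semantics are not specified in this description. This sketch therefore assumes that an index n on a TABLE or TR step is expressed by emitting the corresponding macro n times, that TBODY and TD steps are implicit, and that any index on other tags is ignored. These assumptions were chosen because they reproduce the command sequence listed in Table 1; they should not be read as the definitive mapping.

```python
import re

# One XPath step: a tag name with an optional numeric index, e.g. "TR[2]".
STEP = re.compile(r"^([A-Za-z]+)(?:\[(\d+)\])?$")


def xpath_to_macros(xpath):
    """Break a simple absolute XPath into (macro name, argument) pairs."""
    commands = []
    for step in xpath.strip("/").split("/"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"unsupported XPath step: {step!r}")
        tag = match.group(1).upper()
        index = int(match.group(2) or 1)
        if tag == "TABLE":
            commands += [("MoveToTable", "")] * index      # assumption: repeat per index
        elif tag == "TR":
            commands += [("MoveToTableRow", "")] * index   # assumption: repeat per index
        elif tag in ("TBODY", "TD"):
            continue                                       # assumption: implicit steps
        else:
            commands.append(("MoveToTag", tag))            # index ignored in this sketch
    return commands


for name, argument in xpath_to_macros("//HTML/BODY[1]/TABLE[1]/TBODY[1]/TR[2]/TD[1]"):
    print(name, argument)
```

Under these assumptions, the XPath from Table 1 yields two MoveToTag commands, one MoveToTable command and two MoveToTableRow commands, in that order.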

We now describe embodiments of a computer system that performs process 200. FIG. 3 presents a block diagram illustrating a networked computer system 300 that generates and executes a scraping script. In this computer system, a user of computer 310 may use financial software. This financial software may be a stand-alone application or a portion of another application that is resident on and which executes on computer 310. Alternatively and/or additionally, at least a portion of the financial software may be a financial-software application tool (provided by server 314 via network 312) that is embedded in a web page (and which executes in a virtual environment of a web browser). In an illustrative embodiment, the software-application tool is a software package written in: JavaScript™ (a trademark of Sun Microsystems, Inc.), e.g., the software-application tool includes programs or procedures containing JavaScript instructions, ECMAScript (the specification for which is published by the European Computer Manufacturers Association International), VBScript™ (a trademark of Microsoft, Inc.) or any other client-side scripting language. In other words, the embedded software-application tool may include programs or procedures containing: JavaScript instructions, ECMAScript instructions, VBScript instructions, or instructions in another programming language suitable for rendering by the web browser or another client application on computer 310.

A provider of the financial software may generate scraping scripts, which are resident on and which execute on server 314. These scraping scripts may collect user financial-account information from one or more financial institutions, such as credit-card provider 316, brokerage 318 and/or bank 320. For example, a given scraping script may collect the user financial-account information by accessing one or more web pages and/or websites of the one or more financial institutions.

Furthermore, the collected user financial-account information may be used by the financial software. For example, the collected user financial-account information may be pre-filled into forms in the financial software, thereby making it easier for the user to use the financial software.

To assist the provider of the financial software, the user may request and install a web-browser extension on computer 310. For example, in response to the request, server 314 may provide the web-browser extension to computer 310 via network 312. As discussed previously, when enabled by the user, the web-browser extension may capture user-action information during a session in which the user accesses one or more financial accounts and navigates through one or more web pages of the one or more financial institutions.

Subsequently, the captured user-action information (such as one or more event files) may be uploaded by the web-browser extension to server 314 via network 312. This captured user-action information may be used to create one or more new scraping scripts and/or to maintain an existing scraping script.

Note that the collected user financial-account information and/or the captured user-action information may be stored on server 314 and/or at one or more other locations in computer system 300 (i.e., locally or remotely). Moreover, because this information may be sensitive in nature, it may be encrypted. For example, stored information and/or information communicated via network 312 may be encrypted.

Computers and servers in computer system 300 may include one of a variety of devices capable of manipulating computer-readable data or communicating such data between two or more computing systems over a network, including: a personal computer, a laptop computer, a mainframe computer, a portable electronic device (such as a cellular phone or PDA), a server and/or a client computer (in a client-server architecture). Moreover, network 312 may include: the Internet, World Wide Web (WWW), an intranet, LAN, WAN, MAN, or a combination of networks, or other technology enabling communication between computing systems.

In exemplary embodiments, the financial software includes: Quicken™ and/or TurboTax™ (from Intuit, Inc., of Mountain View, Calif.), Microsoft Money™ (from Microsoft Corporation, of Redmond, Wash.), SplashMoney™ (from SplashData, Inc., of Los Gatos, Calif.), Mvelopes™ (from In2M, Inc., of Draper, Utah), and/or open-source applications such as Gnucash™, PLCash™, Budget™ (from Snowmint Creative Solutions, LLC, of St. Paul, Minn.), and/or other planning software capable of processing financial information.

Moreover, the financial software may include software such as: QuickBooks™ (from Intuit, Inc., of Mountain View, Calif.), Peachtree™ (from The Sage Group PLC, of Newcastle Upon Tyne, the United Kingdom), Peachtree Complete™ (from The Sage Group PLC, of Newcastle Upon Tyne, the United Kingdom), MYOB Business Essentials™ (from MYOB US, Inc., of Rockaway, N.J.), NetSuite Small Business Accounting™ (from NetSuite, Inc., of San Mateo, Calif.), Cougar Mountain™ (from Cougar Mountain Software, of Boise, Id.), Microsoft Office Accounting™ (from Microsoft Corporation, of Redmond, Wash.), Simply Accounting™ (from The Sage Group PLC, of Newcastle Upon Tyne, the United Kingdom), CYMA IV Accounting™ (from CYMA Systems, Inc., of Tempe, Ariz.), DacEasy™ (from Sage Software SB, Inc., of Lawrenceville, Ga.), Microsoft Money™ (from Microsoft Corporation, of Redmond, Wash.), and/or other payroll or accounting software capable of processing payroll information.

FIG. 4 presents a block diagram illustrating a computer system 400 that generates and/or executes a scraping script. Computer system 400 includes one or more processors 410, a communication interface 412, a user interface 414, and one or more signal lines 422 coupling these components together. Note that the one or more processing units 410 may support parallel processing and/or multi-threaded operation, the communication interface 412 may have a persistent communication connection, and the one or more signal lines 422 may constitute a communication bus. Moreover, the user interface 414 may include: a display 416, a keyboard 418, and/or a pointer 420, such as a mouse.

Memory 424 in the computer system 400 may include volatile memory and/or non-volatile memory. More specifically, memory 424 may include: ROM, RAM, EPROM, EEPROM, flash memory, one or more smart cards, one or more magnetic disc storage devices, and/or one or more optical storage devices. Memory 424 may store an operating system 426 that includes procedures (or a set of instructions) for handling various basic system services for performing hardware-dependent tasks. Memory 424 may also store procedures (or a set of instructions) in a communication module 428. These communication procedures may be used for communicating with one or more computers and/or servers, including computers and/or servers that are remotely located with respect to the computer system 400.

Memory 424 may also include multiple program modules (or sets of instructions), including: web browser 430 (or a set of instructions), web-browser extension 432 (or a set of instructions), generating module 438 (or a set of instructions), optional encryption module 444 (or a set of instructions) and/or financial software 446 (or a set of instructions). Note that one or more of these program modules (or sets of instructions) may constitute a computer-program mechanism.

When requested by a user, web-browser extension 432 may be installed on a user's computer. This web-browser extension 432 may execute in a virtual environment of web browser 430 (which may also be installed on the user's computer). Web-browser extension 432 may capture user-action information while the user accesses one or more financial accounts and navigates through one or more web pages of one or more financial institutions 442.

Subsequently, the captured user-action information may be provided to computer system 400. For example, captured user-action information for one or more users may be stored in event files 434, such as event files for session A 436-1 and session B 436-2.

Next, generating module 438 may use the captured user-action information to generate one or more scraping scripts 440. These scraping scripts may be executed on computer system 400 to collect user financial-account information 448, which may be used by financial software 446 (for example, to fill in forms for one or more users). Note that one or more scraping scripts 440 may execute without direct user control. For example, a scraping script may execute (using stored user credential information) without a user request.

In some embodiments, at least some of the information stored in memory 424 and/or at least some of the information communicated using communication module 428 is encrypted using optional encryption module 444.

In some embodiments, the scraping script is generated on a different computer system than the one on which it executes. In these embodiments, financial software 446 is on a separate computer system from generating module 438. However, in other embodiments the scraping script is generated on and executes on the same computer system.

Instructions in the various modules in the memory 424 may be implemented in: a high-level procedural language, an object-oriented programming language, and/or in an assembly or machine language. Note that the programming language may be compiled or interpreted, e.g., configurable or configured, to be executed by the one or more processing units 410.

Although the computer system 400 is illustrated as having a number of discrete items, FIG. 4 is intended to be a functional description of the various features that may be present in the computer system 400 rather than a structural schematic of the embodiments described herein. In practice, and as recognized by those of ordinary skill in the art, the functions of the computer system 400 may be distributed over a large number of servers or computers, with various groups of the servers or computers performing particular subsets of the functions. In some embodiments, some or all of the functionality of the computer system 400 may be implemented in one or more application-specific integrated circuits (ASICs) and/or one or more digital signal processors (DSPs).

Computer systems 300 (FIG. 3) and/or 400 may include fewer components or additional components. Moreover, two or more components may be combined into a single component, and/or a position of one or more components may be changed. In some embodiments, the functionality of the computer system 400 may be implemented more in hardware and less in software, or less in hardware and more in software, as is known in the art.

We now discuss a data structure that may be used in computer systems 300 (FIG. 3) and 400. FIG. 5 presents a block diagram illustrating a data structure 500. This data structure may include information for event files 510, such as event files 510-1 and 510-2. For example, event file 510-1 may include: one or more web pages 512-1 accessed by one or more users, session 514-1 information, one or more user actions 516-1, one or more web-page elements 518-1 the one or more users accessed, one or more XPaths 520-1 associated with web-page elements 518-1, and/or metadata 522-1 associated with web-page elements 518-1. Note that event file 510-1 may also include a sequential order of user actions 516-1.

In some embodiments of data structure 500, there may be fewer or additional components. For example, data structure 500 may include the HTML code for the one or more web pages 512-1. Moreover, two or more components may be combined into a single component and/or a position of one or more components may be changed.
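For illustration only, an event file of the kind shown in FIG. 5 might be represented in Python as sketched below. The class names, field names and types are hypothetical; only the categories of information (web pages, session information, user actions and their order, web-page elements, XPaths and metadata) are taken from the description of data structure 500.

```python
from dataclasses import dataclass, field


@dataclass
class ElementRecord:
    """A web-page element the user accessed, with its XPath and metadata."""
    name: str
    xpath: str
    metadata: dict = field(default_factory=dict)  # e.g. {"data_type": "currency"}


@dataclass
class UserActionRecord:
    """One user action; `order` preserves the sequence of actions."""
    order: int
    kind: str      # e.g. "select", "type", "form_post"
    element: str   # name of the element acted on


@dataclass
class EventFile:
    """Hypothetical in-memory form of one event file."""
    web_pages: list
    session: str
    user_actions: list = field(default_factory=list)
    elements: dict = field(default_factory=dict)

    def record(self, kind, element):
        """Append a user action and remember the element it touched."""
        action = UserActionRecord(len(self.user_actions) + 1, kind, element.name)
        self.elements[element.name] = element
        self.user_actions.append(action)
        return action
```

Storing the order explicitly on each action, rather than relying on list position alone, is one way to keep the sequence recoverable if the actions are later serialized or merged.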

While a scraping script for use with websites associated with financial institutions has been used as an illustrating example in the preceding discussion, in other embodiments the scripting technique may be used to create or maintain a scraping script for use with web pages or websites associated with a wide variety of organizations (including pharmacies, healthcare providers and/or health-insurance companies), as well as with types of user accounts other than financial accounts.

Furthermore, the scraping script may also be used for other purposes, such as user identity validation and/or user authentication. For example, using user-provided credential information, the scraping script may attempt to access a user account via the website of a financial institution. If successful, this process may confirm the user's identity and authorization to perform financial transactions.

The foregoing descriptions of embodiments of the present invention have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.

What is claimed is:
 1. A computer-implemented method for generating a script, comprising: retrieving a script for scraping information from a website of a financial institution, wherein the scraped information is associated with a financial account at the financial institution, and wherein the script is configured to execute on a computer system to scrape information from the website without user intervention; receiving user-action information that was captured during a first session where a first user traversed one or more pages of a website of a financial institution while accessing the first user's financial account at the financial institution, wherein the received user-action information includes information about how the first user traversed the one or more pages of the website during the first session, a layout of the traversed one or more pages of the website, and data locations on the traversed one or more pages of the website; translating the received user-action information into executable operations to perform the received user-actions on the traversed one or more pages of the website without user intervention; determining changes to the traversed one or more pages of the website relative to a version of the one or more pages of the website used to generate the retrieved script based at least in part on the received user-actions; automatically revising the retrieved script based at least in part on the determined changes and received user-action information to resolve a first problem with the script associated with the determined changes; replacing the retrieved script with the revised script; authenticating to a web server that hosts the website of the financial institution with credentials of a second user for accessing the second user's financial account at the financial institution; retrieving the revised script; executing the revised script on the computer system, thereby causing the computer system to navigate the website on behalf of the second user; scrape new information for the second user from the second user's financial account at the financial institution; determining, based on user-action information of the second user captured during a second session where the second user traversed the one or more pages of the website while accessing the second user's financial account at the financial institution, a second set of changes to the website undetected from the user-action information captured during the first session; automatically generating a second revised script based on the revised script, the determined second set of changes, and the user-action information of the second user; and replacing the revised script with the second revised script.
 2. The method of claim 1, wherein the user-action information captured during the first session and the user-action information captured during the second session was captured by a software application that executes in a virtual environment of a web browser.
 3. The method of claim 2, wherein prior to receiving the user-action information captured during the first session of the first user, the method further comprises: receiving a request for the software application from the first user; and in response to the request, providing the software application.
 4. The method of claim 1, wherein the user-action information captured during the first session excludes credential information provided by the first user through the web page during the first session.
 5. The method of claim 1, wherein the user-action information captured during the first session and the user-action information captured during the second session includes metadata associated with the data locations; and wherein the metadata specifies types of data.
 6. The method of claim 1, wherein the user-action information captured during the first session and the user-action information captured during the second session includes one or more events in which the user communicated information with a host system that hosts the web page.
 7. The method of claim 6, wherein, during at least one of the one or more events, the first user provided data to the host system.
 8. The method of claim 1, wherein the user-action information captured during the first session and the user-action information captured during the second session includes information corresponding to at least a portion of a hierarchical structure of the web page; and wherein at least the portion of the hierarchical structure specifies the layout of the web page and the data locations.
 9. The method of claim 8, wherein the hierarchical structure includes a set of nodes corresponding to an eXtensible Markup Language (XML) path.
 10. The method of claim 1, wherein the receiving operation is repeated for multiple users in multiple sessions and the generating operation is based at least in part on captured user-information from the multiple users.
 11. A computer-program product for use in conjunction with a computer system, the computer-program product comprising a non-transitory computer-readable storage medium and a computer-program mechanism embedded therein for configuring the computer system to generate a script, the computer-program mechanism including:
    instructions for retrieving a script for scraping information from a website of a financial institution, wherein the scraped information is associated with a financial account at the financial institution, and wherein the script is configured to execute on a computer system to scrape information from the website without user intervention;
    instructions for receiving user-action information that was captured during a first session where a first user traversed one or more pages of a website of a financial institution while accessing the first user's financial account at the financial institution, wherein the received user-action information includes information about how the first user traversed the one or more pages of the website during the first session, a layout of the traversed one or more pages of the website, and data locations on the traversed one or more pages of the website;
    instructions for translating the received user-action information into executable operations to perform the received user-actions on the traversed one or more pages of the website without user intervention;
    instructions for determining changes to the traversed one or more pages of the website relative to a version of the one or more pages of the website used to generate the retrieved script based at least in part on the received user-actions;
    instructions for automatically revising the retrieved script based at least in part on the determined changes and received user-action information to resolve a first problem with the script associated with the determined changes;
    instructions for replacing the retrieved script with the revised script;
    instructions for authenticating to a web server that hosts the website of the financial institution with credentials of a second user for accessing the second user's financial account at the financial institution;
    instructions for retrieving the revised script;
    instructions for executing the revised script on the computer system, thereby causing the computer system to navigate the website on behalf of the second user;
    instructions for scraping new information for the second user from the second user's financial account at the financial institution;
    instructions for determining, based on user-action information of the second user captured during a second session where the second user traversed the one or more pages of the website while accessing the second user's financial account at the financial institution, a second set of changes to the website undetected from the user-action information captured during the first session;
    instructions for automatically generating a second revised script based on the revised script, the determined second set of changes, and the user-action information of the second user; and
    instructions for replacing the revised script with the second revised script.
 12. The computer-program product of claim 11, wherein the user-action information captured during the first session and the user-action information captured during the second session was captured by a software application that executes in a virtual environment of a web browser.
 13. The computer-program product of claim 12, wherein prior to receiving the user-action information captured during the first session, the computer-program product further comprises: instructions for receiving a request for the software application from the first user; and in response to the request, instructions for providing the software application.
 14. The computer-program product of claim 11, wherein the user-action information captured during the first session and the user-action information captured during the second session includes metadata associated with the data locations; and wherein the metadata specifies types of data.
 15. The computer-program product of claim 11, wherein the user-action information captured during the first session and the user-action information captured during the second session includes one or more events in which the first user communicated information with a host system that hosts the web page.
 16. The computer-program product of claim 15, wherein, during at least one of the one or more events, the first user provided data to the host system.
 17. The computer-program product of claim 11, wherein the user-action information captured during the first session and the user-action information captured during the second session includes information corresponding to at least a portion of a hierarchical structure of the web page; and wherein at least the portion of the hierarchical structure specifies the layout of the web page and the data locations.
 18. The computer-program product of claim 11, wherein the receiving operation is repeated for multiple users in multiple sessions and the generating operation is based at least in part on captured user-information from the multiple users.
 19. A computer system, comprising: a processor; and memory having instructions thereon which, when executed by the processor, perform an operation comprising:
    retrieving a script for scraping information from a website of a financial institution, wherein the scraped information is associated with a financial account at the financial institution, and wherein the script is configured to execute on a computer system to scrape information from the website without user intervention,
    receiving user-action information that was captured during a first session where a first user traversed one or more pages of a website of a financial institution while accessing the first user's financial account at the financial institution, wherein the received user-action information includes information about how the first user traversed the one or more pages of the website during the first session, a layout of the traversed one or more pages of the website, and data locations on the traversed one or more pages of the website,
    translating the received user-action information into executable operations to perform the received user-actions on the traversed one or more pages of the website without user intervention,
    determining changes to the traversed one or more pages of the website relative to a version of the one or more pages of the website used to generate the retrieved script based at least in part on the received user-actions,
    automatically revising the retrieved script based at least in part on the determined changes and received user-action information to resolve a first problem with the script associated with the determined changes,
    replacing the retrieved script with the revised script,
    authenticating to a web server that hosts the website of the financial institution with credentials of a second user for accessing the second user's financial account at the financial institution,
    retrieving the revised script,
    executing the revised script on the computer system, thereby causing the computer system to navigate the website on behalf of the second user,
    scraping new information for the second user from the second user's financial account at the financial institution,
    determining, based on user-action information of the second user captured during a second session where the second user traversed the one or more pages of the website while accessing the second user's financial account at the financial institution, a second set of changes to the website undetected from the user-action information captured during the first session,
    automatically generating a second revised script based on the revised script, the determined second set of changes, and the user-action information of the second user, and
    replacing the revised script with the second revised script.
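
By way of illustration only, the revise-and-replace sequence recited in the independent claims can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: a scraping script is reduced to a mapping from a type of data to an XML path, and every name used (UserAction, strip_credentials, detect_changes, revise_script, the sample paths and the credential labels) is hypothetical and does not appear in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAction:
    """One captured user action (hypothetical record layout)."""
    kind: str            # "navigate", "input" or "read"
    xpath: str           # data location within the page hierarchy
    data_type: str = ""  # metadata: the type of data at that location
    value: str = ""      # what the user typed, if anything


# Assumed labels for credential fields; real captures may label them differently.
CREDENTIAL_TYPES = {"username", "password"}


def strip_credentials(actions):
    """Blank out typed credential values so the capture excludes them."""
    return [
        UserAction(a.kind, a.xpath, a.data_type, "")
        if a.data_type in CREDENTIAL_TYPES else a
        for a in actions
    ]


def detect_changes(script, actions):
    """Return {data_type: (old_xpath, new_xpath)} for data locations that moved."""
    observed = {a.data_type: a.xpath for a in actions if a.kind == "read"}
    return {
        dtype: (old, observed[dtype])
        for dtype, old in script.items()
        if dtype in observed and observed[dtype] != old
    }


def revise_script(script, changes):
    """Build a revised script; the caller replaces the stored script with it."""
    revised = dict(script)
    for dtype, (_old, new) in changes.items():
        revised[dtype] = new
    return revised


# Stored script: the page version it was generated from (hypothetical paths).
stored = {
    "balance": "/html/body/table[1]/tr[2]/td[2]",
    "transactions": "/html/body/table[2]",
}

# First session: the first user reads the balance at a new location.
first_session = strip_credentials([
    UserAction("input", "//input[@id='pw']", "password", "hunter2"),
    UserAction("read", "/html/body/div[1]/span[1]", "balance"),
])
revised = revise_script(stored, detect_changes(stored, first_session))

# Second session: a change the first session did not reveal.
second_session = [
    UserAction("read", "/html/body/div[2]/table[1]", "transactions"),
]
second_revised = revise_script(revised, detect_changes(revised, second_session))
```

In this sketch each revision returns a new mapping rather than mutating the stored one, so "replacing" a script amounts to storing the returned mapping in place of its predecessor. Changes that the first session did not expose (here, the moved transactions table) are picked up only when a later session reads that location.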